✨ Add DangerousWorkflow check for imposter commits. #2789

wlynch · 2023-03-24T20:45:38Z

What kind of change does this PR introduce?

Adds DangerousWorkflow check for imposter commits.

See https://www.chainguard.dev/unchained/what-the-fork-imposter-commits-in-github-actions-and-ci-cd

This borrows its implementation from
https://github.com/chainguard-dev/clank to look up imposter commits for
a repo.

(Is it a bug fix, feature, docs update, something else?)

PR title follows the guidelines defined in our pull request documentation

What is the current behavior?

n/a (new check)

What is the new behavior (if this is a feature change)?

Tests for the changes have been added (for bug fixes/features)

Which issue(s) this PR fixes

Fixes #2733

Special notes for your reviewer

The GitHub e2e test is failing because the Actions YAML it is testing against isn't actually valid (it references non-existent repos). We should fix this, but I figured I'd get this PR out for review.

Does this PR introduce a user-facing change?

For user-facing changes, please add a concise, human-readable release note to
the release-note

(In particular, describe what changes users might need to make in their
application as a result of this pull request.)

Adds new DangerousWorkflow check for detecting imposter commits.

codecov · 2023-03-24T21:30:41Z

Codecov Report

Merging #2789 (b1b6b72) into main (b6362b1) will decrease coverage by 0.06%.
The diff coverage is 45.88%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2789      +/-   ##
==========================================
- Coverage   49.21%   49.15%   -0.06%     
==========================================
  Files         158      158              
  Lines       11967    12129     +162     
==========================================
+ Hits         5889     5962      +73     
- Misses       5709     5791      +82     
- Partials      369      376       +7

spencerschrock

Not a complete review, but wanted to get some comments out that address some big blockers with this current implementation. This change will likely require converting Dangerous-Workflow away from a file content only check:

scorecard/checks/dangerous_workflow.go

Lines 28 to 32 in eb57e04

    
    func init() {
    	supportedRequestTypes := []checker.RequestType{
    		checker.FileBased,
    		checker.CommitBased,
    	}

I'll also post some thoughts in the main issue for this PR that are better suited there.

checks/raw/dangerous_workflow.go

spencerschrock · 2023-03-24T22:04:34Z

clients/githubrepo/client.go

+	newClient := &Client{
+		ctx:           client.ctx,
+		repoClient:    client.repoClient,
+		graphClient:   client.graphClient,
+		contributors:  client.contributors,
+		branches:      client.branches,
+		releases:      client.releases,
+		workflows:     client.workflows,
+		checkruns:     client.checkruns,
+		statuses:      client.statuses,
+		search:        client.search,
+		searchCommits: client.searchCommits,
+		webhook:       client.webhook,
+		languages:     client.languages,
+		licenses:      client.licenses,
+		tarball:       client.tarball,
+	}
+	repo, err := MakeGithubRepo(inputRepo)
+	if err != nil {
+		return nil, err
+	}
+	if err := newClient.InitRepo(repo, commitSHA, commitDepth); err != nil {


I have concerns about what this InitRepo call will do to the state of the parent client since all of the struct fields are pointers.

For example, InitRepo will call client.licenses.init(client.ctx, client.repourl) which will wipe any data seen already, reset some sync primitives, etc.

    func (handler *licensesHandler) init(ctx context.Context, repourl *repoURL) {
    	handler.ctx = ctx
    	handler.repourl = repourl
    	handler.errSetup = nil
    	handler.once = new(sync.Once)
    	handler.licenses = nil
    }

There are all sorts of nasty race conditions and duplicated work waiting to happen here.

Refactored this to use the same handler creation logic as CreateGithubRepoClientWithTransport. PTAL!
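To illustrate the concern in this thread, here is a minimal sketch of why copying handler pointers into a child client is risky. All names here (`client`, `shallowChild`, `freshChild`, and the reduced `licensesHandler`) are hypothetical simplifications, not scorecard's actual types; the only thing taken from the discussion is that `init` resets cached state.

```go
package main

import "sync"

// licensesHandler is a reduced, hypothetical stand-in for a stateful handler.
type licensesHandler struct {
	once     *sync.Once
	repo     string
	licenses []string
}

// init resets all cached state, mirroring the shape of the handler init
// quoted in the review comment.
func (h *licensesHandler) init(repo string) {
	h.repo = repo
	h.once = new(sync.Once)
	h.licenses = nil
}

// client is a hypothetical repo client reduced to a single handler.
type client struct {
	licenses *licensesHandler
}

// shallowChild copies the handler pointer, so parent and child share state.
// Initializing the child therefore also resets the parent's handler.
func shallowChild(parent *client, repo string) *client {
	child := &client{licenses: parent.licenses}
	child.licenses.init(repo)
	return child
}

// freshChild allocates its own handler and leaves the parent untouched.
func freshChild(repo string) *client {
	child := &client{licenses: &licensesHandler{}}
	child.licenses.init(repo)
	return child
}
```

With `shallowChild`, anything the parent had already cached is wiped and its handler now points at the child's repo; `freshChild` avoids that at the cost of allocating new handlers.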

clients/gitlabrepo/client.go

spencerschrock · 2023-03-24T22:53:47Z

I agree with the need for a sub-RepoClient, as we're trying to answer the question "does this commit belong to repo B" but I'm not sure how I feel about the current approach. A new method like NewClient makes sense since RepoClient might be a github/gitlab/local etc, but I'm wondering if there's a way we can make the child RepoClient lighter.

If we modify RepoClient so the struct fields are optional that could help:

e.g.

if client.graphClient != nil {
    client.graphClient.init(client.ctx, client.repourl, client.commitDepth)
}

In the GitHub implementation, for example, the ContainsRevision method only uses the branchesHandler handler, and only to read repourl, which is already a member of Client. So we could even eliminate the need to use branchesHandler.

spencerschrock · 2023-03-24T22:54:29Z

I think it makes sense to discuss this some more before re-implementing things.

wlynch · 2023-04-03T21:16:18Z

For the repo client - good to know that the handlers are stateful. I think we could probably get a fresh client if we just refactor some of the client initialization from here -

scorecard/clients/githubrepo/client.go

Lines 265 to 309 in b6362b1

    
    return &Client{
    	ctx:        ctx,
    	repoClient: client,
    	graphClient: &graphqlHandler{
    		client: graphClient,
    	},
    	contributors: &contributorsHandler{
    		ghClient: client,
    	},
    	branches: &branchesHandler{
    		ghClient:    client,
    		graphClient: graphClient,
    	},
    	releases: &releasesHandler{
    		client: client,
    	},
    	workflows: &workflowsHandler{
    		client: client,
    	},
    	checkruns: &checkrunsHandler{
    		client:      client,
    		graphClient: graphClient,
    	},
    	statuses: &statusesHandler{
    		client: client,
    	},
    	search: &searchHandler{
    		ghClient: client,
    	},
    	searchCommits: &searchCommitsHandler{
    		ghClient: client,
    	},
    	webhook: &webhookHandler{
    		ghClient: client,
    	},
    	languages: &languagesHandler{
    		ghclient: client,
    	},
    	licenses: &licensesHandler{
    		ghclient: client,
    	},
    	tarball: tarballHandler{
    		httpClient: httpClient,
    	},
    }

The main piece we want to reuse is the authed client, which we can pick out from the client struct. I'll take a pass at it. (also open to other ideas)

wlynch · 2023-04-03T21:29:47Z

This change will likely require converting Dangerous-Workflow away from a file content only check:

Looks like CommitBased / FileBased are the only 2 types at the moment.

scorecard/checker/check_request.go

Lines 40 to 45 in eb57e04

    
    const (
    	// FileBased request types require checks to run solely on file-content.
    	FileBased RequestType = iota
    	// CommitBased request types require checks to run on non-HEAD commit content.
    	CommitBased
    )

I assume we'll need to make a new one? Any suggestions for what this new type should be? APIBased?

spencerschrock · 2023-04-03T21:37:33Z

This change will likely require converting Dangerous-Workflow away from a file content only check:

Looks like CommitBased / FileBased are the only 2 types at the moment.

scorecard/checker/check_request.go

Lines 40 to 45 in eb57e04

    const (
    	// FileBased request types require checks to run solely on file-content.
    	FileBased RequestType = iota
    	// CommitBased request types require checks to run on non-HEAD commit content.
    	CommitBased
    )

I assume we'll need to make a new one? Any suggestions for what this new type should be? APIBased?

I meant more along the lines of removing FileBased from the Dangerous-Workflow check as it now requires the API call. I don't think we need an APIBased as every check is assumed to be API based unless FileBased is declared.

scorecard/checks/dangerous_workflow.go

Lines 28 to 32 in b6362b1

    
    func init() {
    	supportedRequestTypes := []checker.RequestType{
    		checker.FileBased,
    		checker.CommitBased,
    	}

Which is a little unfortunate, as there's still plenty of functionality which works on the file content. Scorecard would benefit from some sort of "optional API call", since once a check removes support for FileBased it can't be run locally. But that's outside the scope of this PR.

go run main.go --local=. --checks Dangerous-Workflow
Error: GetEnabled: internal error: Unsupported RequestType [0] by check: Dangerous-Workflow
2023/04/03 14:35:00 error during command execution: GetEnabled: internal error: Unsupported RequestType [0] by check: Dangerous-Workflow
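As a rough illustration of why the local run above fails, here is a minimal sketch of request-type validation. It assumes a local run requires FileBased (value 0, as in the quoted const block), which is inferred from the error output rather than confirmed in this thread. The `unsupported` and `validate` helpers are hypothetical and are not scorecard's actual implementation.

```go
package main

import "fmt"

// RequestType mirrors the two values quoted from checker/check_request.go.
type RequestType int

const (
	// FileBased checks run solely on file content.
	FileBased RequestType = iota
	// CommitBased checks run on non-HEAD commit content.
	CommitBased
)

// unsupported returns the required request types that a check does not
// declare. Hypothetical helper for illustration.
func unsupported(required, supported []RequestType) []RequestType {
	declared := make(map[RequestType]bool, len(supported))
	for _, t := range supported {
		declared[t] = true
	}
	var missing []RequestType
	for _, t := range required {
		if !declared[t] {
			missing = append(missing, t)
		}
	}
	return missing
}

// validate returns an error when a run requires a request type that the
// check has not declared support for.
func validate(check string, required, supported []RequestType) error {
	if missing := unsupported(required, supported); len(missing) > 0 {
		return fmt.Errorf("unsupported RequestType %v by check: %s", missing, check)
	}
	return nil
}
```

Under these assumptions, a check registered with only CommitBased is rejected for any run that requires FileBased, which matches the behavior described above once FileBased is dropped from Dangerous-Workflow.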

spencerschrock · 2023-04-03T21:42:07Z

For the repo client - good to know that the handlers are stateful. I think we could probably get a fresh client if we just refactor some of the client initialization from here -

scorecard/clients/githubrepo/client.go

Lines 265 to 309 in b6362b1

    return &Client{
    	ctx:        ctx,
    	repoClient: client,
    	graphClient: &graphqlHandler{
    		client: graphClient,
    	},
    	contributors: &contributorsHandler{
    		ghClient: client,
    	},
    	branches: &branchesHandler{
    		ghClient:    client,
    		graphClient: graphClient,
    	},
    	releases: &releasesHandler{
    		client: client,
    	},
    	workflows: &workflowsHandler{
    		client: client,
    	},
    	checkruns: &checkrunsHandler{
    		client:      client,
    		graphClient: graphClient,
    	},
    	statuses: &statusesHandler{
    		client: client,
    	},
    	search: &searchHandler{
    		ghClient: client,
    	},
    	searchCommits: &searchCommitsHandler{
    		ghClient: client,
    	},
    	webhook: &webhookHandler{
    		ghClient: client,
    	},
    	languages: &languagesHandler{
    		ghclient: client,
    	},
    	licenses: &licensesHandler{
    		ghclient: client,
    	},
    	tarball: tarballHandler{
    		httpClient: httpClient,
    	},
    }

The main piece we want to reuse is the authed client, which we can pick out from the client struct. I'll take a pass at it. (also open to other ideas)

Agree that extracting the auth'd transport is the main bit we care about. I'll take a look at any follow-up commits later.

We can always revisit the optional init() comment if performance becomes a problem. The tarball handler's init function, for example, downloads the tarball and extracts it to a /tmp directory. So all of these subclients would be doing extra work when all we really care about is making an API call using the auth from the main client.

spencerschrock · 2023-04-03T21:43:31Z

Disregard the comment about the tarball handler; it's loaded lazily, so it shouldn't be a problem.

We need to remove this because we need to make API calls to verify commit reachability for imposter commits. We may want to look into ways to break this up so that the pieces that don't need API access can still run locally. Signed-off-by: Billy Lynch <billy@chainguard.dev>

spencerschrock

The NewClient approach looks good to me.

I've left some comments on the ref parsing, with regard to re-usable workflows (which should be vulnerable to imposter commits) as well as filtering out non commit SHA refs.

And finally, the increased quota usage would likely be a problem for running this check in the cron, so I want to get some input from @azeemshaikh38 on that one.

spencerschrock · 2023-04-04T16:54:08Z

checks/dangerous_workflow.go

-	supportedRequestTypes := []checker.RequestType{
-		checker.FileBased,
-		checker.CommitBased,
-	}
-	if err := registerCheck(CheckDangerousWorkflow, DangerousWorkflow, supportedRequestTypes); err != nil {


The check is still checker.CommitBased in my opinion

spencerschrock · 2023-04-04T18:12:57Z

checks/raw/dangerous_workflow.go

+
+	// If not, query subrepo for commit reachability.
+	// Make new client for referenced repo.
+	subclient, err := c.client.NewClient(repo, "", 0)


commitSHA probably shouldn't be "" here, perhaps clients.HeadSHA?

spencerschrock · 2023-04-04T18:15:17Z

clients/repo_client.go

@@ -52,6 +53,7 @@ type RepoClient interface {
 	ListStatuses(ref string) ([]Status, error)
 	ListWebhooks() ([]Webhook, error)
 	ListProgrammingLanguages() ([]Language, error)
+	ContainsRevision(base, target string) (bool, error)


If clients are initialized at a given SHA (e.g. InitRepo and NewClient take in a commitSHA arg), do we need base as an arg to ContainsRevision?

spencerschrock · 2023-04-04T18:16:50Z

e2e/dangerous_workflow_test.go

+			for _, repo := range []string{
+				"http://github.com/actions/checkout",
+				"http://github.com/ossf-tests/scorecard-check-dangerous-workflow-e2e",
+			} {
+				_, e := git.PlainClone(tmpDir, false, &git.CloneOptions{
+					URL: repo,
+				})
+				Expect(e).Should(BeNil())
+			}


These break since the check is no longer FileBased.

spencerschrock · 2023-04-04T18:19:58Z

checks/raw/dangerous_workflow.go

+	pdata *checker.DangerousWorkflowData,
+) error {
+	ctx := context.TODO()
+	cache := &containsCache{


From the perspective of the cron, it would be nice if the cache persisted between calls. I'm curious how much overlap there would be.

Not sure what the best approach would be

cc @azeemshaikh38

For reference, running just the Dangerous-Workflow check on ossf/scorecard consumes 93 units of core REST quota. Other repos:

39 for urllib3/urllib3
73 for tensorflow/tensorflow

It's going to be very repo-dependent, based on their CI.
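For context on what such a cache might look like, here is a minimal sketch of memoizing reachability lookups per (repo, SHA) pair so that repeated references to the same action cost quota only once. The `lookupFunc` type, the `newContainsCache` constructor, and the field layout are hypothetical; only the `containsCache` name comes from the diff, and the PR's real implementation may differ.

```go
package main

import "sync"

// lookupFunc reports whether sha is reachable in repo. In practice this
// would be an API call that consumes REST quota.
type lookupFunc func(repo, sha string) (bool, error)

// containsCache memoizes reachability answers. Hypothetical sketch; the
// PR's actual containsCache may be structured differently.
type containsCache struct {
	mu      sync.Mutex
	results map[string]bool
	lookup  lookupFunc
}

func newContainsCache(lookup lookupFunc) *containsCache {
	return &containsCache{results: make(map[string]bool), lookup: lookup}
}

// contains returns the cached answer and calls lookup only on a miss.
// The lock is held during lookup, which serializes calls but guarantees
// a key is never looked up twice concurrently.
func (c *containsCache) contains(repo, sha string) (bool, error) {
	key := repo + "@" + sha
	c.mu.Lock()
	defer c.mu.Unlock()
	if ok, hit := c.results[key]; hit {
		return ok, nil
	}
	ok, err := c.lookup(repo, sha)
	if err != nil {
		// Errors are not cached, so a later retry can still succeed.
		return false, err
	}
	c.results[key] = ok
	return ok, nil
}
```

A cache like this only helps within one process. Persisting it between cron runs, as suggested above, would need some external store, which this sketch does not attempt.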

spencerschrock · 2023-04-04T18:34:19Z

checks/raw/dangerous_workflow.go

+				return sce.WithMessage(sce.ErrorCheckRuntime, fmt.Sprintf("unexpected repo reference: %s", s[0]))
+			}
+			repo := strings.Join(repoSplit[:2], "/")
+			sha := s[1]


Can we check here whether the ref is actually a commit SHA? It could also be a tag, which wouldn't need the imposter commit verification.

The branch protection check uses something like this to check for it:

    // as a package level variable
    var reCommitSHA = regexp.MustCompile("^[a-f0-9]{40}$")
    ...
    // when testing
    if !reCommitSHA.MatchString(foo) {
    	continue
    }

We would also need to force the SHA to lowercase, or account for uppercase in the regex.
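A minimal sketch of the suggested filtering, combining the regex above with lowercasing. The `isCommitSHA` and `splitUses` helpers are hypothetical names for illustration. `splitUses` assumes the `owner/repo[/path]@ref` shape implied by the diff, and treats anything without an `@` (such as a local `./path` action) as not applicable.

```go
package main

import (
	"regexp"
	"strings"
)

// reCommitSHA matches a full 40-character lowercase hex commit SHA.
var reCommitSHA = regexp.MustCompile(`^[a-f0-9]{40}$`)

// isCommitSHA reports whether ref looks like a full commit SHA. The ref is
// lowercased first so that uppercase hex digits are also accepted.
func isCommitSHA(ref string) bool {
	return reCommitSHA.MatchString(strings.ToLower(ref))
}

// splitUses splits a uses value of the form "owner/repo[/path]@ref" into
// "owner/repo" and the ref. The final result is false when the value does
// not have that shape. Hypothetical helper.
func splitUses(uses string) (string, string, bool) {
	name, ref, found := strings.Cut(uses, "@")
	if !found {
		return "", "", false
	}
	parts := strings.Split(name, "/")
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0] + "/" + parts[1], ref, true
}
```

With helpers like these, refs such as `v1.4.0` or a branch name would be skipped before any reachability lookup is made, which should also reduce the quota cost discussed elsewhere in this review.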

spencerschrock · 2023-04-04T18:58:02Z

checks/raw/dangerous_workflow.go

+	for _, job := range workflow.Jobs {
+		for _, step := range job.Steps {


This doesn't cover reusable workflows (I assume they are also vulnerable). Would it make sense to loop over the Uses values from both jobs (job.WorkflowCall.Uses.Value) and steps to collect them into a slice, and then run the same analysis over everything in the slice?

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_iduses

I was trying to check how the code handled this job and it wasn't being checked:

scorecard/.github/workflows/goreleaser.yaml

Lines 69 to 75 in b6362b1

    provenance:
      needs: [goreleaser]
      permissions:
        actions: read # To read the workflow path.
        id-token: write # To sign the provenance.
        contents: write # To add assets to a release.
      uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v1.4.0

Reusable workflows can't be hash pinned. Would be interesting to check whether the imposter commit holds true for reusable workflows too.

Taken from the docs:

{ref} can be a SHA, a release tag, or a branch name.

And an experiment I did last year for a different reason
https://github.com/spencerschrock/reusable-workflow-caller/blob/aa0ccd0b1d5255d79a7ba32fd729a1db93d2f124/.github/workflows/scorecard.yml#L30
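The "collect into a slice" idea from this comment could be sketched as follows. The `step` and `job` structs are reduced, hypothetical stand-ins and not the parsed workflow types the check actually uses; the point is only that job-level and step-level uses values end up in one list for the same analysis.

```go
package main

// step is a reduced, hypothetical stand-in for a parsed workflow step.
type step struct {
	Uses string
}

// job is a reduced, hypothetical stand-in for a parsed workflow job.
// Uses is set when the job calls a reusable workflow.
type job struct {
	Uses  string
	Steps []step
}

// collectUses gathers every non-empty uses value from jobs (reusable
// workflow calls) and from their steps (actions), in document order.
func collectUses(jobs []job) []string {
	var refs []string
	for _, j := range jobs {
		if j.Uses != "" {
			refs = append(refs, j.Uses)
		}
		for _, s := range j.Steps {
			if s.Uses != "" {
				refs = append(refs, s.Uses)
			}
		}
	}
	return refs
}
```

Each collected value can then go through the same repo/ref parsing and reachability lookup, so a job like the provenance example above would no longer be skipped.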

azeemshaikh38

Some high-level comments about code architecture:

We could avoid changes to the RepoClient interface altogether and instead inject a new dependency in RunScorecard fn.
For the cron job, we can later optimize this injected dependency with a more API efficient model (like a blobstore-based cache maybe) while regular CLI continues to use REST API.
Flag-guard this change so that it does not get rolled out to prod without some testing on our end.
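One way to read these three suggestions together is sketched below: an injected, flag-guarded dependency instead of a new RepoClient method. Every name here (`CommitVerifier`, `runOptions`, `pinnedRef`, `findImposters`, `verifierFunc`) is hypothetical and not part of scorecard. The two-argument `ContainsRevision` shape is also an assumption that differs from the `(base, target)` signature in the diff.

```go
package main

// CommitVerifier is a hypothetical injected dependency that answers
// "does this commit belong to this repo?". A CLI build could back it with
// REST calls, while the cron could later swap in a cheaper cached model.
type CommitVerifier interface {
	ContainsRevision(repo, sha string) (bool, error)
}

// verifierFunc adapts a plain function to the CommitVerifier interface.
type verifierFunc func(repo, sha string) (bool, error)

func (f verifierFunc) ContainsRevision(repo, sha string) (bool, error) {
	return f(repo, sha)
}

// runOptions carries the flag guard and the injected dependency.
type runOptions struct {
	ImposterCommits bool // feature flag; the analysis is skipped when false
	Verifier        CommitVerifier
}

// pinnedRef is a hash-pinned reference found in a workflow.
type pinnedRef struct {
	Repo string
	SHA  string
}

// findImposters returns the refs whose SHA is not reachable in the named
// repo. It does nothing unless the flag is set and a verifier is injected.
func findImposters(opts runOptions, refs []pinnedRef) ([]pinnedRef, error) {
	if !opts.ImposterCommits || opts.Verifier == nil {
		return nil, nil
	}
	var imposters []pinnedRef
	for _, r := range refs {
		ok, err := opts.Verifier.ContainsRevision(r.Repo, r.SHA)
		if err != nil {
			return nil, err
		}
		if !ok {
			imposters = append(imposters, r)
		}
	}
	return imposters, nil
}
```

The design choice is that checks depend only on the small interface, so the backing implementation can change without touching the RepoClient interface or the check logic.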

azeemshaikh38 · 2023-04-06T16:05:10Z

checks/dangerous_workflow.go

-	supportedRequestTypes := []checker.RequestType{
-		checker.FileBased,
-		checker.CommitBased,
-	}
-	if err := registerCheck(CheckDangerousWorkflow, DangerousWorkflow, supportedRequestTypes); err != nil {


azeemshaikh38 · 2023-04-06T16:21:45Z

clients/repo_client.go

@@ -30,6 +30,7 @@ const HeadSHA = "HEAD"
 // RepoClient interface is used by Scorecard checks to access a repo.
 type RepoClient interface {
 	InitRepo(repo Repo, commitSHA string, commitDepth int) error
+	NewClient(repo string, commitSHA string, commitDepth int) (RepoClient, error)


I'm not sure I understand the use case for the NewClient API. Could we not use CreateGitHubRepoClient instead?

azeemshaikh38 · 2023-04-06T16:29:15Z

checks/raw/dangerous_workflow.go

+	for _, job := range workflow.Jobs {
+		for _, step := range job.Steps {


Reusable workflows can't be hash pinned. Would be interesting to check whether the imposter commit holds true for reusable workflows too.

github-actions · 2023-04-17T01:53:11Z

Stale pull request message

wlynch requested review from azeemshaikh38, justaugustus, laurentsimon, naveensrinivasan, spencerschrock and raghavkaul as code owners March 24, 2023 20:45

wlynch force-pushed the imposter-commits branch from c505c5d to dda3d4d Compare March 24, 2023 21:01

wlynch had a problem deploying to integration-test March 24, 2023 21:22 — with GitHub Actions Failure

spencerschrock requested changes Mar 24, 2023

wlynch added 4 commits April 3, 2023 17:23

Restrict reachability lookup for imposter commits to HEAD.

8abd4c5

This limits the number of calls made instead of probing every branch. Signed-off-by: Billy Lynch <billy@chainguard.dev>

Fix lint errors, and e2e tests.

d0f39cc

Signed-off-by: Billy Lynch <billy@chainguard.dev>

DangerousWorkflow: Fix repo parsing for refs with subpaths.

dd48285

Signed-off-by: Billy Lynch <billy@chainguard.dev>

wlynch force-pushed the imposter-commits branch 2 times, most recently from 678ed46 to b807b44 Compare April 3, 2023 21:25

GitHub/GitLab NewClient: create new handlers for new client instances.

bd350bf

Signed-off-by: Billy Lynch <billy@chainguard.dev>

wlynch force-pushed the imposter-commits branch from b807b44 to bd350bf Compare April 3, 2023 21:27

wlynch requested a review from spencerschrock April 3, 2023 21:46

wlynch had a problem deploying to integration-test April 4, 2023 17:09 — with GitHub Actions Failure

spencerschrock reviewed Apr 4, 2023

azeemshaikh38 reviewed Apr 6, 2023

github-actions bot added the no-pr-activity label Apr 17, 2023

github-actions bot closed this May 7, 2023
